Tools for Artificial Intelligence
Master's in Applied Artificial Intelligence
Final written academic project¶
Build an artificial intelligence application that uses open-source libraries through collaborative tools.
Apply tools and techniques for collecting, analyzing, and presenting data related to Artificial Intelligence.
Identify opportunities to develop and lead technology projects in specific fields of artificial intelligence.
Requirements:
Select a platform: JupyterLab / Google Colab / Deepnote
Use two datasets: one originating from a CSV file and another stored in a database.
Consume the final datasets through the pandas library.
- Add 5 columns to the dataset, based on the context of the data.
- Produce visualizations with:
Matplotlib (2 visualizations)
Bokeh (2 visualizations)
PyGWalker (2 visualizations)
Members - Group 8¶
- Jairo Pillajo
- Luis Estrada
- Juan Yupangui
Point 1: Select a platform¶
We chose JupyterLab for this project because of its advantages when exporting and sharing results. The platform not only lets us create and edit interactive notebooks, but also offers several export formats for the resulting documents. It also integrates with collaborative tools and version-control systems such as Git, which is essential for teamwork.
# Install the required libraries
!pip install pymysql
!pip install pandas
Point 2: Use datasets¶
We read from two data sources: the first is a CSV file (customers); the second is a MySQL database (the sales table).
# Load libraries
import pandas as pd
from sqlalchemy import create_engine

# Load the CSV file
cli_df = pd.read_csv('data/customer.csv', encoding='latin-1', sep=';', dtype='str')
# Print customer information
print("Number of customers: ", cli_df.shape[0])
cli_df.head(5)
Number of customers: 862
| | codi_clie | tipo_ruc | cli_nom | cli_sector | cli_fono | fecha | cli_prov | cli_mail |
|---|---|---|---|---|---|---|---|---|
| 0 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | NaN | NaN | PICHINCHA | NaN |
| 1 | 0000393 | 04 | JIRANDA IZURIITA CIRJIO ANIBAL | LAGO AGRIO | 2473475 | 27/5/2013 | SUCUMBIOS | NaN |
| 2 | 0000658 | 05 | ORIJUILA JORANTI CARLOC | SANTO DOMING | 997434424 | 9/10/2013 | SANTO DOMINGO | bespropoempoeto@yahoo.com |
| 3 | 0000683 | 05 | JARTINIZ JALDONADO JARIANA | PUERTO LOPEZ | 7968497664 | 28/10/2013 | MANABI | elesamarbonabo@yahoo.com |
| 4 | 0000979 | 04 | IL ARTICAN C-A- | PUERTO LOPEZ | 752377732 | 11/7/2014 | MANABI | beltataopalosanto@yahoo.e |
# Read from the MySQL database
# Build the MySQL connection URI
uri = "mysql+pymysql://u830706108_ia_g8:2024IA_g8*.*@193.203.166.22:3306/u830706108_ia_g8"
# Create the SQLAlchemy engine
sql_engine = create_engine(uri)
# SQL statement to query the data
sql = "SELECT * FROM sales"
# Load the data into a DataFrame using SQLAlchemy
sal_df = pd.read_sql(sql, sql_engine)
# Dispose of the engine's connection pool
sql_engine.dispose()
# Print sales information
print("Number of sales: ", sal_df.shape[0])
sal_df.head(5)
Number of sales: 2058
| | caja | fact_anul | fact_cred | codi_clie | codi_fact | vcr_num_fac | vcr_fecha | vcr_suman | vcr_iva | vcr_tipo | vcr_divisa | usuario |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 004 | FALSO | FALSO | 9999999 | 0000124526 | 0000001 | 2019-02-16 | 36.88 | 4.43 | E | USD | 001 |
| 1 | 004 | FALSO | FALSO | 9999999 | 0000124527 | 0000002 | 2019-02-16 | 2.09 | 0.25 | E | USD | 005 |
| 2 | 004 | FALSO | FALSO | 9999999 | 0000124538 | 0000003 | 2019-02-18 | 19.18 | 2.30 | E | USD | 005 |
| 3 | 004 | FALSO | FALSO | 9999999 | 0000124542 | 0000005 | 2019-02-18 | 1.44 | 0.17 | E | USD | 005 |
| 4 | 004 | FALSO | FALSO | 9999999 | 0000124553 | 0000013 | 2019-02-18 | 27.13 | 3.26 | E | USD | 009 |
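The same `pd.read_sql` pattern can be exercised without a MySQL server by using Python's built-in SQLite driver; the table contents here are made up for illustration only:

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for the MySQL server
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (codi_clie TEXT, vcr_suman REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("0000393", 36.88), ("0000658", 2.09)])
con.commit()

# Same call shape as with the SQLAlchemy engine above
df = pd.read_sql("SELECT * FROM sales", con)
con.close()
print(df.shape)  # (2, 2)
```

`pd.read_sql` accepts either a SQLAlchemy engine or a DBAPI2 connection, so the notebook code and this sketch differ only in the connection object.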
Point 3: Merge the datasets¶
We merge the customers DataFrame (cli_df) and the sales DataFrame (sal_df) on the codi_clie attribute.
## Helper functions
# Assign a phone type based on the length of cli_fono
def fnt_tipo_telefono(fono):
    if pd.isnull(fono) or len(str(fono)) == 0:
        return "Ninguno"
    elif 1 <= len(str(fono)) <= 9:
        return "Convencional"
    elif len(str(fono)) == 10:
        return "Celular"
    else:
        return "OT"

# Cast VERDADERO/FALSO values to SI/NO
def fnt_cast_boolean(valor):
    return {'VERDADERO': 'SI', 'FALSO': 'NO', 'SI': 'SI', 'NO': 'NO'}.get(valor)
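A quick sanity check of the two helpers; their definitions are repeated here so the snippet runs on its own:

```python
import pandas as pd

def fnt_tipo_telefono(fono):
    # Classify a phone number by its length
    if pd.isnull(fono) or len(str(fono)) == 0:
        return "Ninguno"
    elif 1 <= len(str(fono)) <= 9:
        return "Convencional"
    elif len(str(fono)) == 10:
        return "Celular"
    else:
        return "OT"

def fnt_cast_boolean(valor):
    # Map VERDADERO/FALSO (and SI/NO) to SI/NO
    return {'VERDADERO': 'SI', 'FALSO': 'NO', 'SI': 'SI', 'NO': 'NO'}.get(valor)

print(fnt_tipo_telefono("2473475"))     # Convencional (7 digits)
print(fnt_tipo_telefono("0997434424"))  # Celular (10 digits)
print(fnt_cast_boolean("FALSO"))        # NO
```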
## Data preparation for cli_df (customer)
from datetime import datetime

# Apply the function to the cli_fono column and store the results in a new column
cli_df['tipo_telefono'] = cli_df['cli_fono'].apply(fnt_tipo_telefono)
cli_df['cantidad_correo'] = cli_df['cli_mail'].fillna('').str.count('@')
# Convert the 'fecha' column to datetime
cli_df['fecha'] = pd.to_datetime(cli_df['fecha'], errors='coerce', dayfirst=True)
# Replace NaN values in 'fecha' with the minimum date
cli_df['fecha'] = cli_df['fecha'].fillna(cli_df['fecha'].min())
# Compute the difference in years between today and the 'fecha' column
cli_df['antiguedad'] = (datetime.today() - cli_df['fecha']).dt.days // 365
# Replace null values in 'tipo_ruc' with "OT"
cli_df['tipo_ruc'] = cli_df['tipo_ruc'].fillna("OT")
# Remove hyphens '-' from the values in 'cli_nom'
cli_df['cli_nom'] = cli_df['cli_nom'].str.replace('-', '')
# Drop columns that are no longer needed
cli_df.drop(columns=['cli_fono', 'cli_mail', 'fecha'], inplace=True)
cli_df.head(5)
| | codi_clie | tipo_ruc | cli_nom | cli_sector | cli_prov | tipo_telefono | cantidad_correo | antiguedad |
|---|---|---|---|---|---|---|---|---|
| 0 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 |
| 1 | 0000393 | 04 | JIRANDA IZURIITA CIRJIO ANIBAL | LAGO AGRIO | SUCUMBIOS | Convencional | 0 | 10 |
| 2 | 0000658 | 05 | ORIJUILA JORANTI CARLOC | SANTO DOMING | SANTO DOMINGO | Convencional | 1 | 10 |
| 3 | 0000683 | 05 | JARTINIZ JALDONADO JARIANA | PUERTO LOPEZ | MANABI | Celular | 1 | 10 |
| 4 | 0000979 | 04 | IL ARTICAN CA | PUERTO LOPEZ | MANABI | Convencional | 1 | 9 |
## Data preparation for sal_df (sales)
# Cast values using the helper function
sal_df['fact_anul'] = sal_df['fact_anul'].apply(fnt_cast_boolean)
sal_df['fact_cred'] = sal_df['fact_cred'].apply(fnt_cast_boolean)
# Drop columns that are not needed
sal_df.drop(columns=['codi_fact', 'vcr_num_fac'], inplace=True)
# Show the first rows
sal_df.head(5)
| | caja | fact_anul | fact_cred | codi_clie | vcr_fecha | vcr_suman | vcr_iva | vcr_tipo | vcr_divisa | usuario |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 004 | NO | NO | 9999999 | 2019-02-16 | 36.88 | 4.43 | E | USD | 001 |
| 1 | 004 | NO | NO | 9999999 | 2019-02-16 | 2.09 | 0.25 | E | USD | 005 |
| 2 | 004 | NO | NO | 9999999 | 2019-02-18 | 19.18 | 2.30 | E | USD | 005 |
| 3 | 004 | NO | NO | 9999999 | 2019-02-18 | 1.44 | 0.17 | E | USD | 005 |
| 4 | 004 | NO | NO | 9999999 | 2019-02-18 | 27.13 | 3.26 | E | USD | 009 |
## Merge the DataFrames
# Merge the DataFrames on the codi_clie attribute and add an origin column
info_df = pd.merge(cli_df, sal_df, on='codi_clie', how='inner', indicator='origin')
# Show the merged DataFrame
print("Total rows: ", info_df.shape[0])
info_df.head(5)
Total rows: 2058
| | codi_clie | tipo_ruc | cli_nom | cli_sector | cli_prov | tipo_telefono | cantidad_correo | antiguedad | caja | fact_anul | fact_cred | vcr_fecha | vcr_suman | vcr_iva | vcr_tipo | vcr_divisa | usuario | origin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-16 | 36.88 | 4.43 | E | USD | 001 | both |
| 1 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-16 | 2.09 | 0.25 | E | USD | 005 | both |
| 2 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 19.18 | 2.30 | E | USD | 005 | both |
| 3 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 1.44 | 0.17 | E | USD | 005 | both |
| 4 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 27.13 | 3.26 | E | USD | 009 | both |
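The behavior of the inner merge with `indicator='origin'` can be illustrated on two tiny, made-up DataFrames:

```python
import pandas as pd

clients = pd.DataFrame({'codi_clie': ['A', 'B'], 'name': ['Ana', 'Luis']})
sales = pd.DataFrame({'codi_clie': ['A', 'A', 'C'], 'amount': [10.0, 5.0, 7.0]})

# Inner join: only codi_clie values present in both frames survive,
# and 'origin' records where each row came from ('both' for inner joins)
merged = pd.merge(clients, sales, on='codi_clie', how='inner', indicator='origin')
print(merged)
# Client 'B' (no sales) and sale 'C' (no client) are dropped: 2 rows remain
```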
## Add computed columns
# Compute vcr_iva as a percentage of vcr_suman
info_df['porcentaje_iva'] = ((info_df['vcr_iva'] / info_df['vcr_suman']) * 100).round(0)
# Format the column as a percentage
info_df['porcentaje_iva'] = info_df['porcentaje_iva'].map("{:.2f}%".format)
# Sum sale amount + tax
info_df['total'] = info_df['vcr_suman'] + info_df['vcr_iva']
info_df.head(5)
| | codi_clie | tipo_ruc | cli_nom | cli_sector | cli_prov | tipo_telefono | cantidad_correo | antiguedad | caja | fact_anul | fact_cred | vcr_fecha | vcr_suman | vcr_iva | vcr_tipo | vcr_divisa | usuario | origin | porcentaje_iva | total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-16 | 36.88 | 4.43 | E | USD | 001 | both | 12.00% | 41.31 |
| 1 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-16 | 2.09 | 0.25 | E | USD | 005 | both | 12.00% | 2.34 |
| 2 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 19.18 | 2.30 | E | USD | 005 | both | 12.00% | 21.48 |
| 3 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 1.44 | 0.17 | E | USD | 005 | both | 12.00% | 1.61 |
| 4 | 9999999 | 01 | CONCUJIDOR JINAL | QUITO | PICHINCHA | Ninguno | 0 | 10 | 004 | NO | NO | 2019-02-18 | 27.13 | 3.26 | E | USD | 009 | both | 12.00% | 30.39 |
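A worked check of the two computed columns on the first row above (vcr_suman = 36.88, vcr_iva = 4.43):

```python
vcr_suman, vcr_iva = 36.88, 4.43

# Same arithmetic as the DataFrame columns, on scalars
porcentaje = round((vcr_iva / vcr_suman) * 100, 0)
total = vcr_suman + vcr_iva

print(f"{porcentaje:.2f}%")  # 12.00%
print(round(total, 2))       # 41.31
```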
Point 4: Visualizations¶
info_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2058 entries, 0 to 2057
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   codi_clie        2058 non-null   object
 1   tipo_ruc         2058 non-null   object
 2   cli_nom          2058 non-null   object
 3   cli_sector       2026 non-null   object
 4   cli_prov         2058 non-null   object
 5   tipo_telefono    2058 non-null   object
 6   cantidad_correo  2058 non-null   int64
 7   antiguedad       2058 non-null   int64
 8   caja             2058 non-null   object
 9   fact_anul        2058 non-null   object
 10  fact_cred        2058 non-null   object
 11  vcr_fecha        2058 non-null   object
 12  vcr_suman        2058 non-null   float64
 13  vcr_iva          2058 non-null   float64
 14  vcr_tipo         2058 non-null   object
 15  vcr_divisa       2058 non-null   object
 16  usuario          2058 non-null   object
 17  origin           2058 non-null   category
 18  porcentaje_iva   2058 non-null   object
 19  total            2058 non-null   float64
dtypes: category(1), float64(3), int64(2), object(14)
memory usage: 307.8+ KB
# Install pygwalker
!pip install pygwalker==0.4.8
# Import the pygwalker library
import pygwalker as pyg
Point 4.1: PyGWalker¶
# Plot customer tenure (antiguedad)
vis_spec = r"""{"config":[{"config":{"defaultAggregated":true,"geoms":["line"],"coordSystem":"generic","limit":-1,"timezoneDisplayOffset":0},"encodings":{"dimensions":[{"fid":"codi_clie","name":"codi_clie","basename":"codi_clie","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_ruc","name":"tipo_ruc","basename":"tipo_ruc","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_nom","name":"cli_nom","basename":"cli_nom","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_sector","name":"cli_sector","basename":"cli_sector","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_telefono","name":"tipo_telefono","basename":"tipo_telefono","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cantidad_correo","name":"cantidad_correo","basename":"cantidad_correo","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"measures":[{"fid":"gw_count_fid","name":"Row count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}},{"fid":"gw_mea_val_fid","name":"Measure values","analyticType":"measure","semanticType":"quantitative","aggName":"sum"}],"rows":[{"fid":"gw_count_fid","name":"Row 
count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}}],"columns":[{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0}],"color":[],"opacity":[],"size":[],"shape":[],"radius":[],"theta":[],"longitude":[],"latitude":[],"geoId":[],"details":[],"filters":[],"text":[]},"layout":{"showActions":false,"showTableSummary":false,"stack":"stack","interactiveScale":false,"zeroScale":true,"size":{"mode":"full","width":320,"height":200},"format":{},"geoKey":"name","resolve":{"x":false,"y":false,"color":false,"opacity":false,"shape":false,"size":false}},"visId":"gw_1XvG","name":"Antiguedad de Clientes"}],"chart_map":{},"workflow_list":[{"workflow":[{"type":"transform","transform":[{"key":"gw_count_fid","expression":{"op":"one","params":[],"as":"gw_count_fid"}}]},{"type":"view","query":[{"op":"aggregate","groupBy":["antiguedad"],"measures":[{"field":"gw_count_fid","agg":"sum","asFieldKey":"gw_count_fid_sum"}]}]}]}],"version":"0.4.8"}"""
pyg.walk(cli_df, spec=vis_spec)
WARNING: parse invoke code failed, This may affect feature of export code.
# Number of customers per province
vis_spec = r"""{"config":[{"config":{"defaultAggregated":true,"geoms":["bar"],"coordSystem":"generic","limit":-1,"timezoneDisplayOffset":0,"folds":["gw_count_fid"]},"encodings":{"dimensions":[{"fid":"codi_clie","name":"codi_clie","basename":"codi_clie","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_ruc","name":"tipo_ruc","basename":"tipo_ruc","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_nom","name":"cli_nom","basename":"cli_nom","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_sector","name":"cli_sector","basename":"cli_sector","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"tipo_telefono","name":"tipo_telefono","basename":"tipo_telefono","semanticType":"nominal","analyticType":"dimension","offset":0},{"fid":"cantidad_correo","name":"cantidad_correo","basename":"cantidad_correo","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"antiguedad","name":"antiguedad","basename":"antiguedad","semanticType":"quantitative","analyticType":"dimension","offset":0},{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"measures":[{"fid":"gw_count_fid","name":"Row count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}},{"fid":"gw_mea_val_fid","name":"Measure values","analyticType":"measure","semanticType":"quantitative","aggName":"sum"}],"rows":[{"fid":"gw_count_fid","name":"Row 
count","analyticType":"measure","semanticType":"quantitative","aggName":"sum","computed":true,"expression":{"op":"one","params":[],"as":"gw_count_fid"}}],"columns":[{"fid":"cli_prov","name":"cli_prov","basename":"cli_prov","semanticType":"nominal","analyticType":"dimension","offset":0}],"color":[],"opacity":[],"size":[],"shape":[],"radius":[],"theta":[],"longitude":[],"latitude":[],"geoId":[],"details":[{"fid":"gw_mea_key_fid","name":"Measure names","analyticType":"dimension","semanticType":"nominal"}],"filters":[],"text":[]},"layout":{"showActions":true,"showTableSummary":false,"stack":"stack","interactiveScale":true,"zeroScale":true,"size":{"mode":"full","width":320,"height":200},"format":{},"geoKey":"name","resolve":{"x":false,"y":false,"color":false,"opacity":false,"shape":false,"size":false}},"visId":"gw_UmV1","name":"Clientes Por Provincia"}],"chart_map":{},"workflow_list":[{"workflow":[{"type":"transform","transform":[{"key":"gw_count_fid","expression":{"op":"one","params":[],"as":"gw_count_fid"}}]},{"type":"view","query":[{"op":"aggregate","groupBy":["cli_prov"],"measures":[{"field":"gw_count_fid","agg":"sum","asFieldKey":"gw_count_fid_sum"}]}]}]}],"version":"0.4.8"}"""
pyg.walk(cli_df, spec=vis_spec)
WARNING: parse invoke code failed, This may affect feature of export code.
Point 4.2: Matplotlib¶
import matplotlib.pyplot as plt

# Count the number of customers per province
clientes_por_provincia = info_df['cli_prov'].value_counts()
# Compute the total sales per province
total_ventas_por_provincia = info_df.groupby('cli_prov')['vcr_suman'].sum()
# Align the sales totals with the customer counts so both Series
# refer to the same province at each position (value_counts sorts by
# count, groupby sorts by province name)
total_ventas_por_provincia = total_ventas_por_provincia.reindex(clientes_por_provincia.index)
# Create the scatter plot
plt.figure(figsize=(10, 8))
plt.scatter(clientes_por_provincia, total_ventas_por_provincia, alpha=0.5)
plt.title('Number of Customers vs. Total Sales per Province')
plt.xlabel('Number of Customers')
plt.ylabel('Total Sales')
plt.grid(True)
plt.show()
# Group by user and cash register, counting the number of sales
ventas_por_usuario_caja = info_df.groupby(['usuario', 'caja']).size().unstack(fill_value=0)
# Create the stacked bar chart
ventas_por_usuario_caja.plot(kind='bar', stacked=True, figsize=(10, 6))
# Set the title and axis labels
plt.title('Number of Sales per User and Cash Register')
plt.xlabel('User')
plt.ylabel('Number of Sales')
# Show the legend
plt.legend(title='Cash Register')
# Show the chart
plt.show()
Point 4.3: Bokeh¶
# Install bokeh
!pip install bokeh
# Import the required libraries
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral6
## Bokeh -> Total sales per month
# Configure Bokeh to render charts directly in the notebook
output_notebook()

# Convert the vcr_fecha column to datetime
info_df['vcr_fecha'] = pd.to_datetime(info_df['vcr_fecha'])
# Create a new column holding only the year and month
info_df['year_month'] = info_df['vcr_fecha'].dt.to_period('M')
# Group by year and month and compute the total sales
ventas_por_mes = info_df.groupby('year_month')['total'].sum().reset_index()
ventas_por_mes['year_month'] = ventas_por_mes['year_month'].dt.to_timestamp()
# Create a data source for Bokeh
source = ColumnDataSource(ventas_por_mes)
# Create the line chart (datetime x-axis, default toolbar disabled)
p = figure(height=400, width=800, title="Total Sales per Month",
           x_axis_type='datetime', toolbar_location=None, tools="")
# Create a hover tool that shows tooltips when the cursor is over a data point
hover = HoverTool(
    tooltips=[
        ("Month", "@year_month{%F}"),    # show the month as a full date
        ("Total sales", "@total{0.00}")  # total sales with two decimals
    ],
    formatters={
        '@year_month': 'datetime',  # format '@year_month' as a date
    }
)
# Add the hover tool to the figure
p.add_tools(hover)
# Draw a line connecting the data points
p.line(x='year_month', y='total', source=source, line_width=2, color='navy')
# Draw circles at each data point to highlight them
p.circle(x='year_month', y='total', size=5, source=source, fill_color="white", color='navy')
# Axis labels
p.xaxis.axis_label = 'Month'
p.yaxis.axis_label = 'Total Sales'
# Show the figure in the notebook
show(p)
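The monthly bucketing above relies on `dt.to_period('M')`; here it is shown in isolation on a small made-up series of dates:

```python
import pandas as pd

df = pd.DataFrame({
    'vcr_fecha': pd.to_datetime(['2019-02-16', '2019-02-18', '2019-03-01']),
    'total': [41.31, 21.48, 10.00],
})

# Collapse each date to its year-month period, then sum per month
df['year_month'] = df['vcr_fecha'].dt.to_period('M')
monthly = df.groupby('year_month')['total'].sum()
print(monthly)  # 2019-02 -> 62.79, 2019-03 -> 10.00
```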
## Bokeh -> Cancelled invoices per month
# Convert the vcr_fecha column to datetime
info_df['vcr_fecha'] = pd.to_datetime(info_df['vcr_fecha'])
# Create a year-and-month column
info_df['year_month'] = info_df['vcr_fecha'].dt.to_period('M')
# Count cancelled and non-cancelled invoices per month
facturas_anuladas_por_mes = info_df.groupby(['year_month', 'fact_anul'])['fact_anul'].count().unstack(fill_value=0).reset_index()
# Convert the periods to strings for the x-axis range
facturas_anuladas_por_mes['year_month'] = facturas_anuladas_por_mes['year_month'].astype(str)
# Create a data source for Bokeh
source = ColumnDataSource(facturas_anuladas_por_mes)
# Create the bar chart
p = figure(height=400, width=800, title="Cancelled Invoices per Month",
           x_range=facturas_anuladas_por_mes['year_month'], toolbar_location=None, tools="")
# Add the stacked bars for cancelled and non-cancelled invoices
p.vbar_stack(['SI', 'NO'], x='year_month', width=0.9, color=['red', 'green'], source=source,
             legend_label=['Cancelled', 'Not cancelled'])
# Configure the chart details
p.xaxis.axis_label = 'Month'
p.yaxis.axis_label = 'Number of Invoices'
p.legend.title = 'Invoice Status'
# Show the chart
show(p)
# Save the resulting CSV
info_df.to_csv('data/salesForCustomerResult.csv', sep=';', index=False)
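A small round-trip sketch showing that the `sep=';'` used when saving must also be passed when reading the file back; an in-memory buffer stands in for the file path:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({'codi_clie': ['0000393'], 'total': [41.31]})

# Write with the same separator used for the result file
buf = StringIO()
df.to_csv(buf, sep=';', index=False)

# Reading back requires the matching separator
# (dtype='str' preserves the leading zeros in codi_clie)
buf.seek(0)
df2 = pd.read_csv(buf, sep=';', dtype='str')
print(df2['codi_clie'][0])  # 0000393
```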